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Epoetin is commonly used to treat anemia in chronic kidney dis¬ 
ease and End Stage Renal Disease subjects undergoing dialysis, how¬ 
ever, there is considerable uncertainty about what level of hemoglobin 
or hematocrit should be targeted in these subjects. In order to address 
this question, we treat epoetin dosing guidelines as a type of dynamic 
treatment regimen. Specifically, we present a methodology for com¬ 
paring the effects of alternative treatment regimens on survival using 
observational data. In randomized trials patients can be assigned to 
follow a specific management guideline, but in observational studies 
subjects can have treatment paths that appear to be adherent to 
multiple regimens at the same time. We present a cloning strategy 
in which each subject contributes follow-up data to each treatment 
regimen to which they are continuously adherent and artificially cen¬ 
sored at first nonadherence. We detail an inverse probability weighted 
log-rank test with a valid asymptotic variance estimate that can be 
used to test survival distributions under two regimens. To compare 
multiple regimens, we propose several marginal structural Cox pro¬ 
portional hazards models with robust variance estimation to account 
for the creation of clones. The methods are illustrated through simula¬ 
tions and applied to an analysis comparing epoetin dosing regimens 
in a cohort of 33,873 adult hemodialysis patients from the United 
States Renal Data System. 

1. Introduction. 

1.1. Epoetin treatment for the correetion of anemia. Erythropoiesis- 
stimulating agents (ESA) are frequently used to correct for anemia (low red 
blood cell counts) in patients with a variety of medical conditions. In partic¬ 
ular, recombinant human erythropoietin (epoetin alfa or, simply, epoetin) 
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has a long history of use in chronic kidney disease (CKD) and End Stage 
Renal Disease (ESRD) subjects undergoing dialysis [Unger et al. (2010)]. 
Initial evidence supported an improved quality of life in subjects whose 
hemoglobin or hematocrit levels rose after treatment with epoetin [Cana¬ 
dian Erythropoietin Study Group (1990), Eschbach (1994)]. In the United 
States treatment for these patients is covered under Medicare and in 2006 
epoetin was identified as the single largest drug expenditure under Medicare 
Part B [United States Government Accountability Office (2006)]. 

Dialysis subjects are given regular injections of epoetin with the dose 
varying over time in response to the subject’s changing hemoglobin or hema¬ 
tocrit levels. Hematocrit is the percentage (%) of red blood cells in blood 
by volume, while hemoglobin (g/dl) is a measure of the oxygen carrying 
hemoglobin protein found in the blood. Both are used as measures of ane¬ 
mia and an approximate conversion between the two is to multiply the 
hemoglobin measure by three. Although epoetin has been in widespread use 
for more than a decade, there is no consensus as to the optimal hemoglobin 
or hematocrit target or dosing algorithm to use in practice. In 2007 the 
National Kidney Eoundation’s Kidney Disease Outcomes Quality Initiatives 
(NKE-K/DOQI) panel updated its recommendations for ESA therapy for 
anemia in Chronic Kidney Disease to suggest that a target hemoglobin range 
of II.0 to 12.0 g/dl (hematocrit 33% to 36%) be used and that hemoglobin 
targets above 13.0 g/dl (hematocrit 39%) not be used [National Kidney 
Eoundation (2006)]. 

Several randomized trials have examined the question of what level of 
hemoglobin or hematocrit should be targeted in order to improve quality 
of life and survival. Besarab et al. (1998) was stopped early when a higher 
risk of death and nonfatal myocardial infarction was observed in dialysis 
subjects treated to achieve a hematocrit of 42% versus those targeted to 
30%. Singh et al. (2006) found an increased risk of a composite endpoint of 
several cardiovascular events and death in chronic kidney disease subjects 
treated to a target hemoglobin levels of 13.5 g/dl versus 11.3 g/dl. Around 
the same time, Driieke et al. (2006) found no significant difference in all-cause 
mortality or death from cardiovascular causes between subjects randomly 
assigned to have their treatment target a normal hemoglobin range of 13.0 
to 15.0 g/dl versus a subnormal range of 10.5 to 11.5 g/dl. More recently, 
Pfeffer et al. (2009) found a nonsignificant increased risk of death or nonfatal 
cardiovascular event in type 2 diabetes subjects with chronic kidney disease 
randomized to a target hemoglobin of 13 g/dl versus those in a placebo 
group treated only to maintain a hemoglobin of about 9.0 g/dl. However, 
there was a significantly higher risk of stroke and thromboembolic events in 
the 13 g/dl group. 

There is also concern that high doses of epoetin may be harmful. Using a 
Cox regression model, Zhang et al. (2004) found that epoetin dose was as¬ 
sociated with increased mortality after adjustment for attained hematocrit 
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level. Brookhart et al. (2010) found a similar association but only among 
subjects with a high achieved hematocrit level. Zhang et al. (2011) found 
that among diabetic patients on dialysis those in the highest epoetin dose 
group had a statistically signihcantly higher risk of experiencing a cardio¬ 
vascular event or death. On the other hand, neither Wang et al. (2010) or 
Miskulin et al. (2013) found evidence of harm or benefit of higher doses. 

Despite the completion of several randomized trials [see additional refer¬ 
ences in Palmer et al. (2010)], there still remains considerable uncertainty in 
the best practice for the treatment of CKD-associated anemia. In particular, 
the optimal target hemoglobin/hematocrit range and epoetin dosing algo¬ 
rithm are unknown [Unger et al. (2010)]. We see this as an opportunity to 
evaluate available observational data to determine what evidence such data 
can provide regarding epoetin dosing strategies in hemodialysis subjects. 

1.2. United States Renal Data System (USRDS) data set. The methods 
described and developed in this manuscript will be applied to a large ob¬ 
servational data set from the United States Renal Data System (USRDS). 
Data is available on 33,873 adult incident ESRD subjects from the year 
2003 with 217,474 total person-months of observation. The annual death 
rate is approximately 15%. Basic demographic characteristics of the analy¬ 
sis cohort are given in Table 1. For each month, the following information 
is available: number of dialysis sessions reported, number of epoetin doses 
recorded, total epoetin dosage (10,000 units), iron supplementation dose. 

Table 1 

Demographic characteristics of 33,873 adult incident End Stage Renal Disease (ESRD) 
subjects from United States Renal Data System (USRDS), 2003 


Characteristic* 

All subjects 
(n = 33,873) 

Male 

(n = 17,389) 

Females 
(n = 16,484) 

Sex 

Male (%) 

51.3 

- 

- 

Age (years) 

66.7 (14.4) 

65.7 (14.8) 

67.7 (14.0) 

BMI (kg/in^) 

27.9 (7.2) 

27.1 (6.3) 

28.8 (8.0) 

Hematocrit (%) 

34.4 (9.8) 

34.3 (10.3) 

34.5 (9.3) 

Race 

White (%) 

63.2 

67.1 

59.2 

Black (%) 

33.4 

29.5 

37.6 

Other (%) 

3.4 

3.5 

3.3 

Gomorbid conditions 

Diabetes (%) 

63.2 

60.0 

66.5 

Hypertension (%) 

83.5 

82.6 

84.5 


‘Gategorical variables are expressed as percentages (%), continuous variables are expressed 
as mean (standard deviation). 
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Hematocrit (%) 

Fig. 1. Proportion of person-months with 25% or greater increase or decrease in epoetin 
dose by current hematocrit (%) level. 

number of days hospitalized and the last hematocrit measurement recorded 
in the month. Dates of baseline hematocrit measurement, first ESRD ser¬ 
vice, first transplant and death are recorded where applicable. Subjects with 
cancer, human immunodeficiency virus (HIV) or acquired immunodehciency 
syndrome (AIDS) were excluded form the analysis cohort. 

Figure 1 shows how the change in epoetin dose is related to the current 
level of hematocrit. Specifically, we plot the proportion of patient-months 
where epoetin dose is either decreased by 25% or more, increased by 25% 
or more, or is maintained within plus or minus 25% of the previous month’s 
dose. Here we see the dynamics of the dose management where subjects with 
lower hematocrit levels are most likely to have their epoetin dose increased 
and those with high hematocrit are most likely to have their dose decreased. 
For patient-months with hematocrit ranging between approximately 32% to 
40%, the most common treatment was to approximately maintain the epo¬ 
etin dosage, suggesting that many physicians guiding treatment considered 
these to be acceptable hematocrit levels. However, there is still considerable 
heterogeneity in treatment changes across all hematocrit levels and we will 
exploit this variation to compare outcomes under various epoetin dosing 
strategies. 

1.3. Dynamic treatment regimens. We consider epoetin dosing strategies 
to be a type of dynamic treatment regimen. A deterministic dynamic treat¬ 
ment regimen is any sequential decision strategy, guideline or rule that de¬ 
fines how a subject’s current treatment depends on their measured covariate 
and possibly treatment histories. In the case of epoetin dosing, a treatment 
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regimen constitutes the target hemoglobin or hematocrit range along with 
rules that dictate how the dose of epoetin should be adjusted over time. 
The specification of candidate treatment guidelines to be studied is a crit¬ 
ical first step in the analysis process. Once the set of possible guidelines 
have been defined, one can retrospectively determine whether or not each 
subject’s treatment was compliant with a particular regimen, and then base 
analyses on those months that were adherent to the regimen under study. 
Since we wish to characterize survival under full compliance to a specific 
dosing guideline, subjects are typically censored at the first visit when their 
treatment trajectory no longer adheres to the regimen under study. 

Currently, there are a limited number of statistical methods that per¬ 
mit direct estimation of the marginal (structural) performance of longitu¬ 
dinal treatment guidelines, and the evaluation of existing methods is quite 
limited with few worked examples and minimal simulation evaluation. An 
extensive review of relevant available methods is found in Chapter 5 of 
Chakraborty and Moodie (2013). Briefly, under appropriate assumptions, 
Inverse Probability of Censoring Weights (IPCW) and Marginal Structural 
Models (MSM) introduced in Robins (1993), Robins, Rotnitzky and Zhao 
(1995) and Robins, Hernan and Brumback (2000) can be used to adjust for 
measured time-dependent confounding and selection bias in observational 
studies. These methods were used in Hernan et al. (2006) to compare sur¬ 
vival under two dynamic treatment regimens for the initiation of highly 
active antiretroviral therapy (HAART) in HIV-infected patients. Further 
analyses have compared multiple candidate CD4 cell count thresholds for 
the initiation of treatment. For example, Orellana, Rotnitzky and Robins 
(2010), Cain et al. (2010, 2011) have introduced methods for comparing 
multiple regimens by creating an artificial data set in which each subject 
contributes observations for each regimen they followed. Recently, Cotton 
and Heagerty (2011) consider dynamic guidelines and a data augmentation 
estimation method, and Shortreed and Moodie (2012) consider quantitative 
outcomes relying on the bootstrap for inference. Robins, Orellana and Rot¬ 
nitzky (2008) also considered a ^r-estimation approach to finding the optimal 
regimen, while Young et al. (2011) focused on analyses using the parametric 
^-formula. 

In this paper we focus on the evaluation of treatment guidelines that tar¬ 
get achieving control of a particular intermediate covariate. The remainder 
of the article is structured as follows: in Section 2 we introduce notation 
and adapt the MSM methods of Cotton and Heagerty (2011) and Orellana, 
Rotnitzky and Robins (2010) to provide a general methodology for the com¬ 
parison of treatment guidelines indexed by a finite parameter, and that map 
the observed dose history and intermediate marker history into a current 
dose assignment. In addition, we introduce a new simple weighted log-rank 
method to test for differences in the population survival distribution that 
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would be realized under alternative dynamic regimens. This test extends 
ideas in Pepe and Couper (1997), Pepe, Heagerty and Whitaker (1999) and 
Zheng and Heagerty (2005). In Section 3 we present a simulation study us¬ 
ing a new data generation structure that permits evaluation of statistical 
methods for evaluation of dynamic guidelines where the structural model 
can be directly determined to satisfy MSM assumptions. We also evaluate 
the use of clustered data sandwich standard errors which are known to be 
valid but potentially conservative when used with a MSM with estimated 
weights. In Section 4 we apply the methods to the USRDS data set of inci¬ 
dent hemodialysis subjects. Our case study extends our previous work [Cot¬ 
ton and Heagerty (2011)] and illustrates the methods using a relatively long 
series of longitudinal data that drives adaptive treatments. As discussed in 
Section 1.1, there is clear medical motivation to study alternative guidelines 
in this setting. Finally, we conclude with a discussion in Section 5. 

2. Methodology. 

2.1. Notation. Let Lj(t) be a vector of possibly time-varying covariates 
collected on the ith subject, i = 1,... ,n, at the tth regularly spaced obser¬ 
vation time, t = 0,1,2,.... Denote the baseline covariates by Vj = Lj(0). 
Let Zi{t) be the treatment (e.g., a drug dosage) prescribed at visit t. It is 
assumed that Zi{t) is determined following the collection of Lj(t) and may 
therefore be influenced by these covariates. Overbars are used to represent 
history up to and including time t so that Lj(t) = {Li(s): s = 0,..., t}. Fi¬ 
nally, let Ti be the event time of interest. Because we are dealing with data 
observed at discrete time points, the exact Tj may not be available, so in¬ 
stead let Di{t) be the indicator of the event occurring in the time period 
{t,t + !]• Assume there is no loss to follow-up and the event is observed in all 
subjects. Later, this assumption can be relaxed by using weighting methods 
similar to those discussed in Section 2.3. 

2.2. Parameterizing the treatment regimen. We will consider regimens 
that specify a range of acceptable treatment values for Zi{t) given a subject’s 
previous treatment value Zi{t — 1) and a single time-varying covariate Li[t)i 
[the first element of the vector Liit)]. For example, the regimen 

' Zi{t-l) X (pi,P2), 
if Li{t)i > 52, 

Zi{t - 1) X (p3,P4), 
if Li{t)i G [5i,52], 

Zi{t-l) X (p5,P6), 
if Li{t)i < bi 


(2.1) Zi{t)\Zi{t-l),Li{t)i,t>^^{ 
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specifies the range of allowable multiplicative changes Zi{t) based on whether 
Li{t)i is above, within or below a target range of ( 61 , 62 ). This type of 
regimen is quite flexible and is fully defined by the set of parameters (pi, 
P 2 , P 3 , Pi, P 5 , P6, 61 , 62 ). The regimen specihcation is easily generalizable to 
additive changes in dose or dependence on multiple time-varying covariates 
from Li(t). 

2.3. Estimation: Creation of clones. For the methods that follow assume 
that it is unknown which, if any, of some K treatment regimens of interest 
a subject was treated under. Depending on the regimen specifications, it 
may be possible for a subject to be adherent to multiple regimens at the 
same time. In order to accommodate this, we propose to clone (or replicate) 
each subject to create K identical copies of their complete treatment and 
covariate history. Let Ljfe(t) = Lj(t) be the copied vector of time-varying co¬ 
variates, Vjfc = Vj be the baseline covariates, Zii^(t) = Zi[t) be the treatment 
dose and let Dik{t) = Di{t) be the event indicator for subject i under regimen 
k. Put another way, this refers to subject i’s /cth clone where k = 1,... ,K. 

Next, we retrospectively determine whether each subject was compliant 
with each treatment regimen. We use the terms compliance and adherence 
interchangeably and let Aik{t) = 1 indicate subject i's adherence to regimen 
k at time t. Otherwise, Aik{t) = 0. Each clone is artificially censored when 
they are no longer adherent with their treatment regimen, and any subject 
with zero adherence time to a specihc regimen will have fewer than K clones 
contribute to the analysis. Let Cik{t) be an indicator of artificial censoring 
for subject i under regimen k at time t. Note that the censoring is fully 
determined through the adherence history A^k (t) and given by Cik (t) = 1 — 
I[Aik(t) = 1], where 1 is a vector of ones the same length as Aik{t). So Cik is 
a vector of zeros followed by ones starting at the first nonadherent visit. Note 
that the use of the subscript A: on Vj^, Dik{t), Zik{t) and Lifc(t) is redundant 
since the cloning process does not alter the event time or follow-up data, as 
it simply defines the artificial censoring time based on adherence. 

The key idea is that the creation of clones with appropriate regimen- 
specihc nonadherence censoring allows us to compare survival under alter¬ 
native dosing strategies. Weights are essential to correct for selection bias 
or for any factors associated with nonadherence to each specific regimen. 
For example, if we only had one regimen of interest, then we would create 
only one parsing of the longitudinal data to reflect observed adherence to 
the regimen under study (e.g., not create multiple clones) but would still 
need to consider weighting for valid inference regarding survival under the 
specihc regimen. 

If Aik{t) = Aii{t) = 1, we say that subject i was coadherent to regimens 
k and I at time t. The concepts of cloning and coadherence are illustrated 
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(a) Adherence under a regimen targeting hematocrit [30%, 36%] 



(b) Adherence under a regimen targeting hematocrit [33% ,39%] 

Fig. 2. Illustration of cloning methodology with hematocrit and epoetin histories for 
one subject from the USRDS data set. Upper panels show monthly hematocrit (%) 
and lower panels show total monthly epoetin dose (10,000 units). In the lower panel 
the shaded grey regions indicate where the epoetin dose would have to fall in or¬ 
der to be compliant with a treatment regimen in the form of equation (2.1) with 
(pi,P2,P3,P4,P5,P6) = (0,0.75,0.75,1.25,1.25,00). 

in Figure 2 with follow-up data from one subject from the USRDS data set. 
Both panels display the same observed hematocrit and epoetin dose his¬ 
tories. The upper and lower panels consider regimens targeting hematocrit 
ranges of [30%,36%] and [33%,39%], respectively. Both regimens specify 
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an allowable multiplicative change in epoetin of plus or minus 25% when 
hematocrit is within the target. For each month, the grey shaded region 
indicates the range of epoetin doses that would have led to compliance with 
the regimen at that month. From the upper panel we see that the subject is 
compliant up to and including month seven. The lower panel indicates com¬ 
pliance to the higher target regimen up to month 11. So for months three 
through seven this subject was coadherent to both regimens. The subject is 
artificially censored at months 8 (upper panel) and 12 (lower panel). 

The artificial censoring has the potential to induce selection bias. For 
example, if subjects with less severe disease, and hence longer survival times, 
are less likely to be censored under a particular regimen, the analysis set will 
be overrepresented by subjects with less severe disease. In an unadjusted 
analysis the effectiveness of the regimen would be overestimated. We use 
stabilized inverse probability weights (IPW) [Robins et al. (1992)] to attempt 
to adjust for this potential selection bias: 

/ N _ TT P[Cikis) = 0|Cifc(s — 1) = 0, V) = Vi] 
swik{ } - U p[Qfc(s) = 0|Qfc(s - 1) = 0,Zi(s) = i)(s)] • 

At each time point, each clone is weighted by the inverse of the proba¬ 
bility that they remained adherent to their treatment regimen given their 
measured covariate history. So adherent clones account for themselves as 
well as other similar subjects who were nonadherent to the regimen and 
therefore artificially censored. The model in the numerator includes only 
baseline covariates and serves to reduce the variability (i.e., stabilize) of 
the weights. This occurs because the probabilities in the numerator and de¬ 
nominator tend to be correlated. Details of the estimation of these weights 
have been well covered in the literature [Robins, Hernan and Brumback 
(2000), Hernan, Brumback and Robins (2001)]. Using these weights creates 
a pseudo-population in which the probability of remaining adherent is inde¬ 
pendent of measured confounders. In order for these methods to be valid, we 
must assume that the baseline and longitudinal information is sufficiently 
predictive of nonadherence to satisfy the assumption of effective sequential 
randomization [Hernan, Brumback and Robins (2001)]. We discuss assump¬ 
tions in detail in the next section. 

The above cloning process induces a unique correlation structure on the 
created clusters of data. Within-clone correlation exists over time due to 
the estimation of weights. The weights within clone sets are expected to 
be more similar over time than the weights between any two of a subject’s 
clones. However, between clone correlation also exists because, if observed 
(i.e., each clone remained uncensored under their respective regimen), both 
clones will have the same time of death. 
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We have induced a form of censoring and explicitly consider weights to ad¬ 
just for this selection bias. However, additional selection bias or confounding 
may exist in an observational data set, and additional work may be needed to 
conduct valid inference. Additional weights can also be included to account 
for censoring due to loss to follow-up or administrative censoring. 

2.3.1. Causal assumptions. Suppressing the subject subscript i, we as¬ 
sume for each possible history a there is a corresponding counterfactual 
event time Ta- In the methods that follow we make the following assump¬ 
tions. First, the sequential randomization or no unmeasured confounders 
assumption states that conditional on the observed covariate and treatment 
history, the treatment a subject received at time t is independent of their 
counterfactual outcomes T^'. 

TaY[A{k)\A{k-l)=dik-l),L{k) = I{k),T>u{k), 

for all histories d{k — 1) and l{k) where u{k) is the time of visit k [Hernan, 
Brumback and Robins (2001), Robins (1998, 2000)]. Next, the positivity 
assumption states that all subjects have a nonzero probability of being ad¬ 
herent to any regimen: 

0 < P[A{k) = l\A{k-l) = a{k - l),L{k) = I{k),T > u{k)] < 1 

with probability 1. Finally, we make the stable unit treatment value assump¬ 
tion that one subject’s potential outcome is not influenced by the treatment 
allocated to other subjects [Rubin (1980)]. 

2.4. Cloned IPW weighted log-rank test. Suppose there are two regimens 
of interest {k = 1,2) and each subject has been cloned as outlined above so 
that for each subject the survival of clone k is considered under regimen k. 
Essentially, this is paired survival data and in order to use a log-rank test, 
adjustments must be made for the correlation between pairs/clones. How¬ 
ever, the IPW also needs to be incorporated to adjust for the selection bias 
induced by the artificial censoring. The cloned IPW weighted log-rank test 
presented here is an extension of the unpaired test described by Xie and Liu 
(2005) and relies on methods from Jung (1999) for calculating the standard 
error of the rank test statistic for paired survival data. The hypothesis to be 
tested is that the cumulative hazard functions are the same under the two 
regimens; 

Hq : Ai(t) = A 2 (t) for all 0 < t < r versus 

Hi: Ai(t) / A 2 (t) for some 0 < t < r, 

where r is the largest time at which both sets of clones have at least one sub¬ 
ject at risk and Ai(t) and A 2 (t) are the true underlying cumulative hazard 
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functions. For snbjects i = 1,... ,n let {Tii,Ti 2 ) be the i.i.d. paired (cloned) 
survival times and (Cji, (7*2), i = 1,..., n be the i.i.d. cloned censoring times. 
Then Xi^ = min(Tjfc,(7*^) is the observed event time and = /(Tj^ < (7*/*) 
is the event indicator for subject i under regimen /c = 1,2. So the full set 
of observed data is given by {(Aji,Xj 2 ,Aji,Aj 2 ), * = Note that 

our unique cloning correlation structure implies that if Aji = A *2 = 1, then 
Til = Ti2 . 

Using standard survival analysis notation, the event process is given by 
= Ajfc/(Ajfc < t), and Nf.{t) = is the total number of 

deaths observed under regimen k at or before time t. The standard at risk 
process is given by Yik{t) = I{t < Xik), so Yfc(t) = Y17=iTik{t) is the total 
number of subjects at risk at time t under regimen k. 

For each regimen assume that the true time-varying subject-specific 
weights are Wik{t) and that consistent estimates swik{t) are available. Now 
define a weighted event process through its derivative as = 

Er=i dN^kit): where 


dXm = w,k{t) dN,kit) = I 


if t = Xik and A*fe 
otherwise 


= T 


and a weighted at risk process as (t) = Yl'i=i (^) where (t) = Wik {t) x 
li^ik < t). Recall the Nelson estimator of the cumulative hazard and define 
a corresponding weighted version: 

'-'dNkis) rdN]:{s) 


Afc(t) = 


/ 


A^(t) = 


n(s) ’ Jo Y,-{s)- 

Jung (1999) provides the details of a log-rank test with correlated survival 
times. With the addition of time-varying snbject-specific weights, a natural 
extension of the standard class of rank statistics is 


W’ 


r 

= V^ 

Jo 


Hit)[dArit)-dA^it)] 


with 


Hit) = 1 




nY;^{t)+Y^{t)' 

The statistic W* is equivalent to the usual form of the log-rank test 
statistic as the sum over time of the difference in the observed number 
of deaths in one group and the expected number of deaths in that group 
under Hq. For discrete time points t = 1,... ,T let dk{t) = ^11=1 ^ikI{J^ik = 
t) be the number of deaths observed in group k at time t and d^(t) = 
'YJi=iWik{T}XikI{Xik = t) be the weighted number of deaths in gronp k at 
time t. Then it can be shown that 

1 ... f d^{t)+di^{t) 


VF* = 


E 

t=i 


dr(t)-Tr(t) 


Y^{t) + Y^{t))\ 








12 


C. A. COTTON AND P. J. HEAGERTY 


In Appendix A in the supplementary material [Cotton and Heagerty 
(2014)] we derive the above form of W*, show that under Hq, W* is asymp¬ 
totically normal with mean 0 and variance ci^, and give a consistent estima¬ 
tor for cj^. 

2.5. Cloned marginal structural Cox proportional hazards models. 

2.5.1. Comparison of two treatment regimens. The usual Cox propor¬ 
tional hazards adherence-based MSM [Hernan, Brumback and Robins (2000, 
2001), Robins and Finkelstein (2000)] can be used with the cloned survival 
data provided that valid standard errors are used to account for the “clone 
clusters.” In the simplest setting of comparing two treatment regimens with 
known regimen membership at baseline, we can specify a proportional haz¬ 
ards marginal association model: 

\T{t\Gi,Wi) = Ao(t) exp(/3iGi -h a'Vj), 

where Ao(t) is an unspecified baseline hazard function, Gi is an indicator 
of regimen assignment and V) is a set of baseline (nontime-varying) covari¬ 
ates. If there were no censoring/nonadherence and regimen membership Gi 
had been randomly assigned, then there would be no confounding and the 
parameter /3i would have a causal interpretation. 

Specihcally, exp(/3i) is the causal hazard ratio comparing the two regi¬ 
mens. Most standard statistical software packages do not allow for the in¬ 
clusion of subject-specific time-varying weights in fitting a Cox model. How¬ 
ever, the model can be fit using weighted pooled logistic regression weighted 
by swi{t) with each subject visit treated as a single observation: 

logit P[D,{t) = l\Di{t-l)= 0, G*, V,] = /3o(t) + l3iGi + a'Vi. 

Here /3o{t) is a time-specihc intercept usually ht as a spline. While this 
yields a consistent estimate of exp(/3i), the estimated standard error may 
be invalid since the estimation of the weights induces a within-subject cor¬ 
relation. In order to overcome this, the model is ht using a Generalized 
Estimating Equations (GEE) approach with working independence [Liang 
and Zeger (1986)]. 

Eor the cloned data setting we proceed in the same manner but treat 
each of the 2n clones as independent observations. Let Gik = I[k = 2] be the 
indicator that the clone is followed under regimen 2. To ht the MSM, each 
clone visit is now treated as a single observation in the logistic model: 

logit P[Dikit) = l\Dikit - 1) = 0, Gik,Vik] = Poit) + IdiGik + a'Vi. 

We assume working independence, but based on results in Lee, Wei and 
Amato (1992) for the Cox model, the estimated regression parameters will 
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still be consistent. A consistent variance estimate can be obtained from 
the standard GEE sandwich covariance estimate if weights are known, and 
will provide conservative standard errors with estimated weights [Hernan, 
Brumback and Robins (2001)]. 

2.5.2. Extension to multiple treatment regimens. Instead of comparing 
just two treatment regimens, suppose there are multiple regimens to be com¬ 
pared simultaneously. Let Git = k, k = 1,2,..., K indicate a clone’s treat¬ 
ment regimen assignment. There are a variety of different Cox proportional 
hazard MSMs that can be considered. In general, let 

(2.2) AT(t|G,fc, V,) = Ao(t)exp[/3(Gifc,t) + a'V,], 

where f3{Gik,t) is a smooth function of both the observation time t and the 
regimen number Gik- Special cases of the above model include assuming a 
linear regimen effect /3{Gik,t) = l3Gik or treating regimen as a factor variable 
with or without interactions with time. A more flexible model would include 
splines in regimen number and/or the effect of time. The interpretation 
of /3 will be as a causal hazard ratio, although the precise interpretation 
will depend on the model specification. We assume that the effect of the 
covariates V is constant across the comparison of any two regimens at any 
times. 

3. Simulation study. A simulation study was undertaken to: (1) illus¬ 
trate the structural models described above in a setting where counterfac- 
tual outcomes satisfy known relationships, ( 2 ) evaluate the performance of 
the point estimation strategy, and (3) evaluate the performance of the sand¬ 
wich standard errors. The first and third of these goals have not been fully 
addressed in the existing literature. 

For each of iL = 6 regimens we simulated = 2500 survival times using 
an exponential distribution with rate parameter A^. This generated contin¬ 
uous simulated survival times Tj for i = 1,..., n = ^ = 15,000 subjects 

each under full adherence to one regimen. We discretize Tj to Di{t) = I[t < 
Ti < t 1], the indicator of death in the next time period. Let k represent 
the regimen under which subject i’s survival time was generated. The ad¬ 
herence indicators for regimen k are A^^(t) = 1 for t = 0,... ,int(Tj), where 
int (Tj) is the largest integer less than Ti. 

Each simulated subject is cloned and their adherence to the K — 1 other 
regimens is simulated based on a set of fixed coadherence probabilities.The 
rate parameters A^ are selected in such a way that the hazard ratios A^/As 
for A: = 2,4,5 are the same for the original raw data and the cloned data. 
Details are given in Appendix B.l in the supplementary material [Cotton and 
Heagerty (2014)]. The data generation method does not guarantee that the 
hazard ratios Ai /A 3 and Ag /A 3 are the same before and after the cloning, so 
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Table 2 

Estimated median hazard ratios (HR), empirieal standard errors (ESE), average 
standard errors (ASE) and empirical 95% confidence interval coverages (ECP) from 
cloning methodology simulations at three levels of induced selection bias (500 replications 

each with nt = 2500 ) 


^ ^ j ^ Clones, unweighted Clones, IPW weighted 


Regimen 

HR 

data HR 

HR 

ESE 

ASE 

ECP 

HR 

ESE 

ASE 

ECP 




No induced selection bias 





2 vs 3 

0.91 

0.90 

0.89 

0.0204 

0.0233 

92.8 

0.89 

0.0221 

0.0232 

90.6 

4 vs 3 

1.17 

1.17 

1.18 

0.0305 

0.0316 

94.8 

1.18 

0.0330 

0.0315 

92.8 

5 vs 3 

1.28 

1.29 

1.31 

0.0341 

0.0352 

90.2 

1.31 

0.0352 

0.0357 

88.2 




Moderate selection bias 





2 vs 3 

0.91 

0.90 

0.89 

0.0206 

0.0230 

94.2 

0.89 

0.0213 

0.0225 

92.4 

4 vs 3 

1.17 

1.18 

1.27 

0.0308 

0.0340 

11.8 

1.17 

0.0301 

0.0314 

94.8 

5 vs 3 

1.28 

1.29 

1.40 

0.0357 

0.0376 

9.2 

1.30 

0.0346 

0.0355 

92.8 




Severe selection bias 





2 vs 3 

0.91 

0.91 

0.90 

0.0217 

0.0227 

94.0 

0.89 

0.0217 

0.0218 

92.6 

4 vs 3 

1.17 

1.18 

1.37 

0.0335 

0.0365 

0.0 

1.17 

0.0302 

0.0314 

96.4 

5 vs 3 

1.28 

1.29 

1.49 

0.0416 

0.0401 

0.0 

1.30 

0.0362 

0.0354 

90.2 


the results for these two regimens are not included. To induce selection bias 
through artificial censoring, a scalar baseline covariate Vi associated with 
both survival time and coadherence (and therefore censoring) is included. 
We consider three levels of selection bias: none, moderate and severe. For 
details, see Appendix B.2. 

The concept of coadherence in the simulation study relates directly to the 
comparison of multiple treatment regimens. Consider two regimens of the 
form of equation (2.1) with overlapping target ranges. At any given month 
a subject is much more likely to be adherent to both these regimens than 
they would be to be adherent to two regimens with nonoverlapping target 
ranges. In addition, any baseline covariate that’s used in the decision of how 
to change treatment in response to changing hematocrit may be associated 
with coadherence of two regimens. 

The aggregated results of 500 simulations are presented in Table 2. Reg¬ 
imen /c = 3 is considered the reference regimen. The first two columns of 
the table present the true underlying hazard ratios A^/As for A: = 2,4,5 and 
the median of the 500 estimated hazard ratios based on the original n simu¬ 
lated fully compliant event times. The agreement between these two columns 
demonstrates that the true underlying hazard ratios can be estimated from 
the data despite the discretization of the event times. 

The next four columns present results from the cloned, unweighted data 
and demonstrate the effect of the selection bias induced through the coad¬ 
herence probabilities described in Appendix B.2. In all scenarios the data 
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was generated without selective nonadherence between regimens 2 and 3 and 
the median estimated hazard ratio and the coverage of the 95% confidence 
intervals just below the nominal level. The coadherence probabilities do in¬ 
duce substantial selection bias between regimens 4 and 3 and regimens 5 and 
3. In the severe selection bias scenario the empirical coverage probability of 
the confidence intervals is zero. 

The hnal four columns of Table 2 give the results of weighting the clones 
by the estimated IPW. A logistic model for adherence given the Vi covariate 
is used for the weights. In all three scenarios the median estimated haz¬ 
ard ratios are very close to the truth. In the moderate and severe selection 
bias scenarios the coverage of the confidence intervals is greatly improved, 
although it is slightly below the nominal level in several cases. In cases with¬ 
out selection bias, we suspect that the lower than nominal coverage rates in 
the IPW analysis are due to instability of inefficiency of the IPW estimates 
after the inclusion of unnecessary weights. The empirical and average stan¬ 
dard errors for the unweighted and weighted clones are comparable. This 
supports the claim that the zero coverage in the unweighted case is due to 
bias in the estimates as opposed to underestimating the variance. 

4. Application to USRDS data set. In this section we apply the cloning 
methodology to a large data set from the USRDS introduced in Section 1.2. 
Analysis begins at month 3 (since the initial dosing strategy is different 
from the maintenance dosing strategy that we wish to study), with up to 9 
months of follow-up data per subject (t = 0,..., 9). 

4.1. Naive analyses. First we use Cox proportional hazards regression 
to conduct naive analyses of the acute association between mortality and 
epoetin dose. The last month’s assigned dose is treated as a time-dependent 
covariate in models adjusted for baseline covariates (age, sex, race, diabetes 
and hypertension). Models are fit both with and without time-dependent 
hematocrit (measured as the average of the last two months’ values). Both 
models yield essentially the same result. The estimated log hazard ratio 
associated with one unit increase in epoetin dose is 0.031 (0.028, 0.034), 
suggesting that higher epoetin doses are associated with an increased risk of 
mortality, as one would expect higher hematocrit levels are associated with 
a reduced risk of mortality [log hazard ratio of —0.035 (—0.032, —0.038)]. 
The naive analysis is difficult to translate into recommendations for epoetin 
dosing strategies since trying to attain a high hematocrit level and a low 
epoetin dose may be incompatible in most patients. 

4.2. Dynamic treatment regimens for epoetin dosing. We consider multi¬ 
ple dynamic treatment regimens for epoetin dose Zi[t) given current hema¬ 
tocrit level Li{t)i and previous dose Zi{t — 1) of the form in equation (2.1), 
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where 


Zi{t)\Zi{t - l),Li{t)i,t> 0 G 


Zi{t-l) X (-oo,-(l-p)), 
if Li{t)i > X — 3, 

Zi{t- 1) X (0.75,1.25), 
if G [x — 3, a; + 3] 

Zi{t - 1) X ((1 +p),oo), 
if Li{t)i < X + 3, 


with X representing the midpoint of the target hematocrit range of (x — 
3, X + 3) and p controlling the allowable multiplicative change in dose out¬ 
side the target range. We consider regiments with x = 31,... ,40 and p = 
0.05,... ,0.50 in 0.05 increments and refer to the regimens by the notation 
G{p,x — 3,x-|-3). The regimen ^(0.25,30,36) is used as the baseline regi¬ 
men for comparison purposes. A target range of [30%,36%] was selected to 
mimic the subnormal targets used in several of the clinical trials referenced 
in Section 1.1. 

The logistic model for the denominator of the IPW includes a spline in 
time along with the baseline covariates gender, age, race and indicators of 
diabetes and hypertension and the time-varying covariates previous month’s 
total epoetin dose and indicators of whether the average of the current and 
previous hematocrit was in the ranges (0,28], (28,32], (36,40], (40, oo), as 
well as the difference between the current and the previous hematocrit. The 
model for the numerator was the same as above but did not include the 
time-varying covariates. Additional weights were calculated for administra¬ 
tive censoring (i.e., clones who were still alive and compliant to their regimen 
at month 12) and censoring due to loss to follow-up (i.e., clones who were 
alive and compliant at a hnal recorded visit occurred prior to month 12). All 
weight models were stratified by regimen. The three stabilized weights were 
multiplied to give the hnal weight used in the analyses. The weight specih- 
cation is key for valid causal inference. All available potential confounders, 
in particular, key variables known to drive changes in epoetin dosing and 
epoetin response [for a summary see Table 1 of Miskulin et al. (2009)], were 
included in the model to guard as best as possible against potential viola¬ 
tion of the no unmeasured eonfounders assumption. However, we did not 
have access to detailed lab data nor do we have clinic data for additional 
adjustment and we acknowledge these potential limitations. 


4.3. Results of cloned IPW weighted log-rank test. The cloned IPW 
weighted log-rank test from Section 2.4 was applied testing the equivalence 
of survival under various regimens with ^(0.25,30,36). The results of a sub¬ 
set of the tests are given in Table 3. We reject the null hypothesis that the 
two survivor functions are equal (p < 0.05) in all cases except the compar¬ 
isons of ^(0.10,30,36) and ^(0.40,30,36) [the two regimens that share a 
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Table 3 

Results of cloned IPW weighted log-rank test, USRDS data, 
p-values for tests of regimens ^( p , a ; — 3 , a ; + 3 ) versus regimen 
0 ( 0 . 25 , 30 , 36 ) 




P 


0.10 

0.25 

0.40 

31 

0.032 

0.001 

0.001 

33 

0.114 

Reference 

0.423 

X 35 

< 0.001 

0.006 

0.077 

37 

< 0.001 

< 0.001 

< 0.001 

39 

< 0.001 

< 0.001 

0.001 


common hematocrit target range with the reference regimen ^(0.25,30,36)] 
and ^(0.40,32,38) (a regimen with a slightly higher target range and requir¬ 
ing more aggressive epoetin dose changes). The results of these tests indicate 
that there are significant differences in survival across possible epoetin dos¬ 
ing regimens. We will proceed with a regression analysis to try and capture 
trends in the survival across regimens. 

4.4. Cloned MSM results. Several models of the form of equation (2.2) 
have been fit to the data. The comparison of target hematocrit ranges is of 
primary interest, so the focus in on comparison of the regimens ^(0.25, x — 
3, X -|- 3). In models where regimen number is considered as a linear variable, 
the regimens are ordered by increasing target range midpoint. In models 
where regimen number is treated as a factor variable, pairwise comparisons 
between the regimen ^(0.25,30,36) and the nine other regimens with p = 
0.25 are considered. 

1. Linear regimen effect. The estimated causal log hazard ratio for a one 
unit increase in regimen number is —0.017 (—0.022, —0.011). This is equiva¬ 
lent to a causal hazard ratio of 0.983 (0.978, 0.989), suggesting that regimens 
that target higher hematocrit ranges provide a small but statistically signif¬ 
icant reduction in mortality. 

2. Regimen treated as a factor variable. The estimated causal hazard ra¬ 
tios comparing each treatment regimen with ^(0.25,30,36) are presented in 
Table 4. These results also suggest that regimens targeting a higher hemat¬ 
ocrit range yield a higher rate of survival. The hazard ratios are significantly 
different from one another for regimens targeting a range of [34%,40%] or 
higher. 

3. Linear regimen effect with a log time interaction. Table 5 gives the fit¬ 
ted causal hazard ratios for a one unit increase in regimen number at each 
observation month. For all months after month 3 (the baseline month) the 
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Table 4 

Estimated hazard ratios from a cloned Cox marginal 
structural model (MSM), regimen treated as a factor variable 
with reference 5(0.25,30,36), USRDS data 


Regimen comparison Hazard ratio 95% Cl 


5(0.25,28,34) 

VS 

5(0.25,30,36) 

1.056 

(0.985, 

1.132) 

5(0.25,29,35) 

vs 

5(0.25,30,36) 

1.027 

(0.959, 

1.101) 

5(0.25,30,36) 

vs 

5(0.25,30,36) 

Reference 



5(0.25,31,37) 

vs 

5(0.25,30,36) 

0.984 

(0.919, 

1.054) 

5(0.25,32,38) 

vs 

5(0.25,30,36) 

0.958 

(0.895, 

1.026) 

5(0.25,33,39) 

vs 

5(0.25,30,36) 

0.940 

(0.878, 

1.006) 

5(0.25,34,40) 

vs 

5(0.25,30,36) 

0.920 

(0.859, 

0.985) 

5(0.25,35,41) 

vs 

5(0.25,30,36) 

0.913 

(0.853, 

0.977) 

5(0.25,36,42) 

vs 

5(0.25,30,36) 

0.915 

(0.854, 

0.980) 

5(0.25,37,43) 

vs 

5(0.25,30,36) 

0.915 

(0.855, 

0.980) 


hazard ratio is statistically significantly less than zero, suggesting that sur¬ 
vival improves as the hematocrit target range midpoint increases. There is a 
trend in decreasing hazard ratio over time, suggesting that at later months 
there is an increased effect of treating with a regimen targeting a higher 
range. 

4. Regimen treated as a factor variable with a log time interaction. Recall 
that in this model within each regimen pair the log hazard ratio is assumed to 
be linear in logf. A plot of the estimated causal log hazard ratios and point- 
wise 95% confidence intervals at month 9 is given in Figure 3. Informally, 


Table 5 

Estimated hazard ratios by month for a one unit increase in 
the midpoint of the target hematocrit range x in regimens 
5(0.25,® — 3,x -|-3), cloned Cox marginal structural model 
(MSM), linear regimen effect with a log time interaction, 
USRDS data 


Month 

Hazard ratio 

95% Cl 

3 

0.999 

(0.993, 1.004) 

4 

0.950 

(0.940, 0.959) 

5 

0.914 

(0.898, 0.930) 

6 

0.885 

(0.864, 0.907) 

7 

0.862 

(0.837, 0.887) 

8 

0.842 

(0.814, 0.871) 

9 

0.825 

(0.794, 0.857) 

10 

0.810 

(0.777, 0.845) 

11 

0.797 

(0.761, 0.834) 

12 

0.785 

(0.747, 0.824) 
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Midpoint of hematocrit (%) target range (x-3,x+3) 


Fig. 3. Estimated log hazard ratios at 9 months for regimens G(0.25,x — 3,1 + 3) versus 
regimen 6(0.25,30,36), cloned Cox marginal structural model (MSM), regimen treated as 
a factor variable with a log time interaction, USRDS data. 

in both graphs an initial downward trend in estimates is seen for regimens 
with higher target ranges followed by possibly a flatter trend at the highest 
target ranges. The causal log hazard ratio for full compliance to the regimen 
^(0.25,34,40) to survival under full compliance to the regimen ^(0.25,30,36) 
was —0.51 (-0.77,-0.25) at month 6 and —0.80 (-1.22,-0.37) at month 9. 

These four models all show the same trend in survival when considering 
the regimens ^(0.25, x — 3, x + 3), x = 31,..., 40 as defined above. In general, 
regimens with target ranges above the reference range of [30%, 36%] provide 
a survival advantage when compared to ^(0.25,30,36). These gains persisted 
with the inclusion of log(month) in the model. It is apparent in Figure 3 that 
while regimens with a higher target range yield a survival advantage, the 
improvement remains relatively constant for regimens with targets at or 
above [34%, 40%]. 

Similar models to those above were fit to the data for regimens, allowing 
varying multiplicative changes in epoetin doses outside the target range. 
The specihc results are not reported here, but in general there were small 
(statistically insignificant) survival advantages for regimens with smaller p, 
that is, those that allowed smaller changes in epoetin dose when hematocrit 
was outside the [30%,36%] target range. 

5. Discussion. In this article we detailed a cloning methodology for com¬ 
paring dynamic treatment regimens when regimen membership is not known 
at baseline. Our goal was to perform an appropriate analysis of motivating 
data from dialysis patients. In order to implement cloning methods, we first 
detailed a simple log-rank test, and then proposed use of clustered survival 
methods for regression inference. In order to evaluate the proposed meth¬ 
ods, we provide a novel simulation scenario that can control the structural 
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parameters. The methods are based on replicating or cloning each subject 
and considering the adherence of each clone’s treatment history to a partic¬ 
ular treatment regimen under consideration. Clones are artificially censored 
at their first nonadherent observation and IPW are used to adjust for the 
induced selection bias. If there are only two treatment regimens under com¬ 
parison, we have shown that a cloned IPW weighted log-rank test can be 
used to test for equality of the survivor functions. The proposed variance 
estimator appropriately adjusts for the correlation within clones. When mul¬ 
tiple treatment regimens are under consideration a Cox proportional hazards 
adherence-based MSM can be used to compare survival under the regimens. 
The structural regression model can take a variety of forms. In particular, 
one can choose to model regimen number as a linear or factor variable and 
choose whether or not to include interactions with time. In all cases, a con¬ 
sistent estimate of the causal hazard ratio is available. 

For epoetin dosing in incident ESRD hemodialysis subjects, we have ap¬ 
plied this methodology to a large USRDS data set to compare survival across 
multiple treatment regimens. As a demonstration, a variety of models were 
fit, but all essentially gave the same conclusion. Subjects tend to experi¬ 
ence lower all-cause mortality when treated under epoetin dosing rules with 
higher hematocrit target ranges. However, there is evidence that there is no 
further gain in survival under regimens with targets above [34%, 40%]. This 
result is scientifically meaningful, especially in light of the uncertainty in 
best practice for the treatment of CKD/ESRD-associated anemia. 

This methodology is appealing because there is no requirement that regi¬ 
mens under consideration be of the same form. In fact, as long as adherence 
can be precisely determined, the treatment regimens can be extremely com¬ 
plex and depend on multiple time-varying covariates or prognostic factors. 
Through cloning, all subjects contribute information to all regimens to which 
they were continuously adherent. However, due to artificial censoring, any 
follow-up after a nonadherent visit is discarded. Further work is warranted 
to explore methods that might overcome or relax this requirement. 

The current methods do not explicitly distinguish between different types 
of adherence (above, within or below target). A possible extension would 
be to include patient status relative to the target in the adherence model. 
Alternatively, it would be possible to consider adherence as a multinomial 
variable and simultaneously model the different types of nonadherence, for 
example, nonadherence due to insufficient increase in dose when the subject 
is below target, insufficient dose decrease when above target, unnecessary 
dose increase when within target or unnecessary dose decrease when within 
target. This would complicate the definition of the stabilized weights but 
warrants further investigation. 
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SUPPLEMENTARY MATERIAL 

Appendices (DOI: 10.1214/14-AOAS774SUPP; .pdf). The supplementary 
material includes Appendix A: Asymptotics of Cloned IPW Weighted Log- 
Rank Test and Appendix B: Simulation Details. 
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